COMPARATION BASED BOTTOM-UP AND TOP-DOWN FILTERING 
MODEL OF THE HIPPOCAMPUS AND ITS ENVIRONMENT 
Abstract. Two rate code models of the hippocampal-entorhinal loop, a reconstruction
network model and a control model, are merged. The hippocampal-entorhinal loop plays a
double role in the unified model: it is part of a reconstruction network and part of a
controller. This double role turns the bottom-up information flow into top-down,
control-like signals. The role of bottom-up filtering is information maximization, noise
filtering, temporal integration and prediction, whereas the role of top-down filtering is
emphasizing, i.e., highlighting or 'paving of the way', as well as context based pattern
completion. In the joined model, the control task is performed by cortical areas, whereas
reconstruction networks can be found between cortical areas. While the controller is
highly non-linear, the reconstruction network is an almost linear architecture, which is
optimized for noise estimation and noise filtering. A conjecture of the reconstruction
network model, namely that the long-term memory of the visual stream is the linear feedback
connections between neocortical areas, is reinforced by the joined model. Falsifying
predictions are presented; some of them have recent experimental support. Connections
to attention and to awareness are made.
1. Introduction



Ever since the discovery of the central role of the hippocampus and its adjacent
areas in memory formation [Sidman et al., 1968, Milner, 1972], numerous studies and
models have dealt with the properties and the possible functions of the hippocampus and its
environment. The number of new experimental findings is increasing and highlights the
complexity of the behavior of memory. Although views are strikingly different, they
seem to have their own, experimentally supported merits. The interested reader is
referred to the literature for excellent reviews on the hippocampus, e.g., [Squire, 1992],
or [Hasselmo and McClelland, 1999] and [O'Reilly and Rudy, 1999]. The majority of the
models have been developed to describe one part (mainly the CA3 field) of the hippocampus
(see, e.g., [Levy, 1996, Kali and Dayan, 2000] and references therein). Some recent
models have made attempts to develop an integrating model of the HC [Rolls, 1989,
Hasselmo et al., 1996, Lisman, 1999, Eichenbaum, 2000, Hasselmo et al., 2002]; see
also the collection of theoretical papers [Gluck, 199^]. It is known, though, that the
hippocampus is deeply embedded in the neocortical information flow through the
entorhinal cortex (EC). This fact explains the emergence of a few EC-HC models like
[McClelland et al., 1995, Myers et al., 1995, Rolls, 2000]. Embedding is justified in most
of them. For example, McClelland et al. [McClelland et al., 1995] emphasize the necessity
of a dual system for the seemingly contradictory tasks of learning specifics and
allowing for generalization.

The computational model that we present here has its origin in the long-standing
proposal that the hippocampus and/or its environment serve as a 'comparator'
[Grastyan et al., 1959, Sokolov, 1963, Vinogradova, 1975]. There are more recent works
on this subject. Oftentimes models use somewhat different nomenclature, e.g., the
focus is placed on match/mismatch detection [Ranck Jr., 1973, O'Keefe and Nadel, 1978,
Grossberg, 1982]. Match/mismatch detection is closely related to familiarity/novelty
detection, another direction of theoretical efforts to describe medial temporal lobe
areas [Otto and Eichenbaum, 1992, Rolls et al., 1993, Wiebe and Staubli, 1999]. Note that
precise distinction between orienting, salient and novel stimuli is not an easy matter
[Rugg, 1995]. This work is about merging two comparator based models: the control
model of the entorhinal-hippocampal loop [Lorincz, 1998] and the reconstruction network
model of the same loop [Lorincz and Buzsaki, 2000]. Both models have their own
merits. For example, there is a large body of experimental data supporting the idea
that attention shapes (influences, controls) perception. For excellent reviews, see, e.g.,
[Duncan, 1999, Posner and DiGirolamo, 2000, LaBerge, 2000] and references therein. The
control model involves the comparator function, because, by construction, control
concerns the difference between desired and actual parameters. On the other hand, the
reconstruction network is also a comparator: it has a hidden layer, works as an
auto-associator, and compares input to the auto-associated reconstructed input. It turns
out that the reconstruction network is an appealing structure for experience based
optimization of noise filtering [Lorincz et al., 2002b]. We shall merge the two comparator
structures and shall map the merged structure to the entorhinal-hippocampal loop and its
environment. This merging will enable us to make physiological predictions concerning
persistent activities, delay properties, long-term memory and statistical versus one-shot
learning.

There are two approaches that should be mentioned, because both of them find
their place in the present model. Gluck and Myers [Gluck and Myers, 1993] have
designed a model to perform reconstruction and classification together for modeling some
properties of the hippocampus. Rao and Ballard [Rao and Ballard, 1997, Rao, 1999,
Rao and Ballard, 1999] have suggested an integrating model of the visual stream by
exploring a Kalman-filter analogy to cope with the input and system uncertainties (treated
as noise) and presented a hierarchy for error correction and prediction using top-down
inference from higher levels. The Kalman-filter is a kind of reconstruction or generative
network, which uses an internal representation to generate expected inputs. The mapping of
the proposed function onto the anatomical substrate has remained elusive. The
Kalman-filter model, which is an approximation to our model, has been criticized because
such recurrent loop structures are slow compared with the feedforward processing found in
neocortical areas [Koch and Poggio, 1999]. Our model resolves this problem.



Mathematical theorems and numerical studies concerning individual components
and combinations of those have been presented elsewhere, or have been made available
in the form of technical reports. For example, hierarchical reconstruction networks
[Lorincz et al., 2002b], the dynamics of the network as well as the order of learning
in reconstruction networks [Lorincz et al., 2002c], implicit memory phenomena, such
as priming [Lorincz et al., 2002c], and category formation in reconstruction networks
[Keri et al., 2002] are provided in the cited papers. Numerical studies concerning the
control architecture can be found in [Szepesvari et al., 1997, Szepesvari and Lorincz, 1997a,
Szepesvari and Lorincz, 1998]. Mathematical considerations as well as numerical studies
of the control architecture embedded into the reinforcement learning framework
have been thoroughly described in [Szita et al., 2003]. Connections to reinforcement
learning, the key to fast, possibly one-shot memory encoding, have been presented in
[Kalmar et al., 1998, Szita et al., 2003] as well as in [Szita and Lorincz, 2003]. The
mathematical considerations have been complemented by some generalization concerning the
control architecture in order to meet the requirements posed by the merging of the two
architectures. This slight mathematical generalization can be found in the Appendix of
a technical report [Lorincz, 2003]. The novelty of the present work is in the merging of
the two models and in the description of how the two models can be merged. We shall find
that stability properties of the controller are improved by the merging: the reconstruction
network filters the input noise of the controller. (Considerations on noise sensitivity of the
controller can be found in [Szepesvari and Lorincz, 1997a].) Here, the model is described
in words and in figures. The interested reader may wish to consult the cited works about
the mathematical details.

In what follows, first, the terminology and the preliminaries are reviewed (Section
2). In Section 3 the merging of the control and reconstruction architectures into a
single building block of a hierarchical structure is described. The closure of the hierarchy
provides the view that the hippocampus plays a double role; it is part of a controller
and contributes to a reconstruction network, too. A subsequent section deals with the
mapping of the control and reconstruction architectures to the entorhinal-hippocampal loop
and its environment. Physiological properties captured by the model as well as some
falsifying predictions are listed and explained in that section. Conclusions are drawn in
the final section.



2. Notations and preliminaries 



[Figure 1, panels (A) to (D): diagrams relating input(s) to output(s).]



Figure 1. Notations 

A: Control representation of input output systems. Computations are per- 
formed in the box. 

B: Neural network representation of computations: inputs are received by 
input neurons and are (non-linearly) transformed by connections and the 
output neurons, which provide the outputs. An output neuron could be an 
input neuron of the next processing stage. 

C: Linear neural transformation. Input: x, transformation W, output: y, 
y = Wx. 

D: Non-linear neural transformation, y = f(Wx). Graphical form: arrow
with a circle. More than one transformation may exist between layers.
A recurrent network is a neural layer with a transformation that targets the
same layer.

Terminology in the context of neurobiology: 

Layer is encompassed by an area (e.g., typical neocortical areas are made
of 6 layers), or it is a subfield, such as the CA3 and CA1 regions of the
hippocampus. Transformations may correspond to (i) excitatory synapses
connecting layers or targeting neurons of the same layer, such as the
recurrent collaterals and the associative connections of the CA3 subfield of the
hippocampus and the intra-layer excitatory connections of layers II and III
of the neocortex or (ii) inhibitory synapses between layers or within layers,
such as the rich interneural networks in the hippocampus.



Notations of the control field and notations of neural networks differ. In control
theory, input-output systems are considered. In graphical form, a box denotes the system
and arrows denote the system's input(s) and output(s). Processing occurs in the box
(Fig. 1(A)). In contrast, artificial neural networks consist of computational units,
the putative analogs of real neurons. The units, also called neurons, receive inputs and
provide outputs through the connection structure, and this internal functioning is drawn
explicitly. Neurons execute simple computations, like summing up inputs, thresholding
and the like. The main part of the neural network performs distributed computation using
the connection structure, which performs (non-linear) filtering. This distributed filter
system, which may connect all neurons, is called the connection system, weights, or synapses.
In a neural network architecture, different neural layers are distinguished. Connections
between these layers are explicitly drawn in most cases (Fig. 1(B)). Computations of neural
networks between their inputs and outputs can be given in the following condensed form:

(1) $y = f(Wx)$

where input and output are denoted by $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^m$,
respectively, the linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^m$ is
represented by matrix $W \in \mathbb{R}^{m \times n}$, the connections, and
function $f$ denotes a component-wise non-linearity. If this function is the identity
function, then we have a linear network. Here, a simplified notation will be used: neural
layers will be denoted by horizontal thick lines. Any particular set of connections between
two layers will be represented by a single arrow. A feedforward linear network is depicted in
Fig. 1(C). The graphical form of a network with component-wise non-linearity is shown
in Fig. 1(D). Different transformations may exist between two layers. Recurrent
connections (also called 'recurrent collaterals') target the same layer where they originate.
Equation 1 simplifies to the usual input-output mapping of a single neuron for $m = 1$.
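
As a minimal illustration of Eq. 1, the following Python sketch evaluates y = f(Wx) for a
small layer. The weight values, the input vector and the choice of tanh as the
component-wise non-linearity are assumptions made for this example only; they are not
taken from the model.

```python
import math

def layer(W, x, f=lambda a: a):
    """Return y = f(Wx); f acts component-wise (identity gives a linear network)."""
    return [f(sum(w_ij * x_j for w_ij, x_j in zip(row, x))) for row in W]

# Hypothetical weights mapping n = 3 inputs to m = 2 outputs.
W = [[1.0, 0.0, -1.0],
     [0.5, 0.5, 0.5]]
x = [2.0, 1.0, 1.0]

y_linear = layer(W, x)                # linear case, as in Fig. 1(C): y = Wx
y_nonlinear = layer(W, x, math.tanh)  # non-linear case, as in Fig. 1(D): y = f(Wx)
```

With a single row in W (m = 1) the same function computes the input-output mapping of a
single neuron.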

2.1. Preliminaries. 

Controller. Our control model is formulated in terms of state dependent directions
pointing towards target positions. A mapping that assigns a direction (change of state, or
change of state per unit time, i.e., velocity) to each state is called a speed-field. A
particular speed-field is given, for example, by the difference vectors between the target
state and all other states. An important feature of a speed-field is that motion is not
specified in time. The control task is defined as moving according to the speed-field at
each state. This control task is called speed-field tracking (SFT). For a review on SFT,
see, e.g., [Hwang and Ahuja, 1992]. The control task of path (also called trajectory)
tracking is different from SFT. This difference is illustrated in Fig. 2. One might say
that SFT is less stringent and puts more emphasis on the global goal than on the tracking
of a prescribed trajectory.
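
The difference-vector speed-field mentioned above can be sketched in a few lines of
Python. This is only an illustration under simplifying assumptions: the plant is taken to
be ideal (it realizes the prescribed speed exactly), and the step size, the target and the
perturbation are hypothetical values. The point of the sketch is that the desired speed is
looked up at the actual state, so no time-indexed trajectory has to be restored after a
perturbation.

```python
def speed_field(x, target):
    """Desired change of state at x: the difference vector pointing to the target."""
    return [t - xi for t, xi in zip(target, x)]

def track(x, target, steps=200, dt=0.05, kick_at=50, kick=(0.0, 1.0)):
    """Follow the speed-field with an ideal plant; one perturbation is applied."""
    x = list(x)
    for k in range(steps):
        if k == kick_at:                    # external perturbation of the state
            x = [xi + ki for xi, ki in zip(x, kick)]
        v = speed_field(x, target)          # speed prescribed at the actual state
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

final_state = track([0.0, 0.0], target=[1.0, 0.0])
```

After the perturbation the state simply continues along the speed prescribed in its new
neighborhood and still approaches the target.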

The dynamic equation of a system is a (possibly continuous) set of differential
equations. This set of equations determines the change of state per unit time $\dot{x}$
given the state of the plant (also called the system under control) $x \in \mathbb{R}^n$
and the external forces acting upon the system, including the control action
$u \in \mathbb{R}^p$. Let

(2) $\dot{x} = f(x, u)$

denote the dynamics of the system under control. This is what the system does.

Inverse dynamics works in the opposite way: given the state and (desired) change
of state, inverse dynamics provides the control vector. The controller, in turn, maps state
and speed to control action. Let $v(x) \in \mathbb{R}^n$ denote the desired change of state (the desired
Figure 2. Tracking and inverse dynamics

A: TT (trajectory tracking): Horizontal lines represent different nearby trajectories
in 2 dimensions. Black line: initial trajectory. Dotted line: perturbed trajectory.
Upon perturbations the system (the 'plant') should return to the original
trajectory, which is also specified in time (not shown).
SFT: Horizontal lines represent a small homogeneous part of a speed-field
$v(x)$ to be tracked, embedded into a 2 dimensional space $x$. Black line: initial
speed trajectory. Dotted line: perturbed motion. Upon perturbation,
the plant adjusts its speed to the prescribed speed in its actual neighborhood.

B: Inverse dynamics produces a control vector for a given pair of state and
the corresponding desired speed.



momentum). Assume that we have an approximate feedforward model of the inverse
dynamics:

(3) $u_{ff} = u_{ff}(x, v(x))$

(4) $\phantom{u_{ff}} = A(x) v(x) + b(x)$

where $A(x) \in \mathbb{R}^{p \times n}$, $b(x) \in \mathbb{R}^{p}$. Equations 3 and 4
represent an input-output system receiving the state and the desired momentum, which is
denoted by $v(x)$, and providing output (the control vector
$u_{ff}: \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^p$). This control vector could be
used directly to influence the plant and it is called the feedforward controller. If the
inverse dynamics is perfect, then using this control vector directly, that is, inserting this
control vector into the dynamic equation $\dot{x} = f(x, u)$, achieves the desired change of
state: the perfect feedforward control vector $u^*_{ff}$ makes the plant produce
momentum $v(x)$:

(5) $v(x) = f(x, u^*_{ff}(x, v(x)))$
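
The relation between the dynamics, the inverse dynamics and the feedforward control vector
can be checked on a toy example. The scalar plant below and the matching terms A(x) and
b(x) are hypothetical; they were chosen so that the inverse model is exact, in which case
the plant should produce exactly the desired speed.

```python
def plant(x, u):
    """Hypothetical scalar plant dynamics, x_dot = f(x, u)."""
    return (1.0 + x * x) * u - 0.5 * x

def A(x):
    """State dependent gain of the exact inverse model of the plant above."""
    return 1.0 / (1.0 + x * x)

def b(x):
    """State dependent offset of the exact inverse model of the plant above."""
    return 0.5 * x / (1.0 + x * x)

def u_ff(x, v):
    """Feedforward control vector in the linear form u_ff = A(x) v + b(x)."""
    return A(x) * v + b(x)

def desired_speed(x, target=2.0):
    """Difference-vector speed-field towards a hypothetical target state."""
    return target - x

x = 0.7
v = desired_speed(x)
achieved = plant(x, u_ff(x, v))  # matches v when the inverse model is perfect
```

If A(x) or b(x) is perturbed, `achieved` deviates from `v`; this deviation is the error
that a feedback controller has to correct.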
Figure 3. Robust controller for speed-field tracking tasks

A: The model of the inverse dynamics receives the actual state $x$
and the desired momentum (or desired speed vector) $v(x)$ as inputs. The output
of the model is the feedforward control vector $u_{ff} = u_{ff}(x, v(x))$. The
feedforward control vector may need corrections. The feedback control
vector is the difference between the feedforward control vector and the
experienced control vector $u_{ff}(x, \dot{x})$: $u_{fb} = u_{ff}(x, v(x)) - u_{ff}(x, \dot{x})$.
B: The static and dynamic state (SDS) feedback controller. The output of
the feedback controller is (i) applied directly and (ii) integrated in time after
multiplication by the gain factor, and the results are added to form the almost
correct control vector $u^*$. Differing texture of the two feedback controllers
denotes that computations in these input-output devices can differ as long
as certain mathematical conditions concerning the sign of the components
of the control vectors are met.



If the feedforward control vector is imprecise, then an error (a difference between the
desired momentum and the experienced momentum) $e_c = v(x) - \dot{x}$ appears. To correct
this error, the (same or another) model of the inverse dynamics can be used. The error
correcting controller is called the feedback controller [Szepesvari et al., 1997]. The output
of the feedback controller $u_{fb}$,

(6) $u_{fb} = u_{ff}(x, v(x)) - u_{ff}(x, \dot{x})$

can be temporally integrated

(7) $w(t) = \int_{-\infty}^{t} \dot{w}\, dt = \int_{-\infty}^{t} \Lambda u_{fb}\, dt$

and the two terms together,

(8) $u^* = u_{fb} + w$

provide a globally stable and almost perfect controller [Szepesvari and Lorincz, 1997a,
Lorincz, 2003]. This scheme is depicted in Fig. 3. Note that Eq. 3 allows one to write
$u_{fb}$ as

(9) $u_{fb} = A(x)(v(x) - \dot{x})$

The controller allows for temporal changes of the non-linear terms $A(x)$ and $b(x)$, a rare
property in the control literature. The conditions of this property make the working of the
architecture strongly non-linear [Lorincz et al., 2001] and will be discussed later. Also,
the two feedback controllers of Fig. 3(B) can differ [Lorincz, 2003]:

(10) $\dot{w} = \Lambda \hat{A}(x)(v(x) - \dot{x})$

(11) $u^* = A(x)(v(x) - \dot{x}) + w$

where $\hat{A}$ can be different from $A$. Both controllers operate by comparing 'desired'
and 'experienced' quantities. Under certain conditions, global stability is reached by
the two controllers. The proof relies on an extension to Ljapunov's second method
[Szepesvari et al., 1997, Szepesvari and Lorincz, 1997a, Szepesvari and Lorincz, 1997b,
Lorincz, 2003]. The controller architecture of Fig. 3 is called the static and dynamic
state feedback controller, or SDS controller, for short.
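
The interplay of the static and the integrated feedback terms can be illustrated with a
small discrete-time simulation. The sketch below is not the author's implementation: the
scalar plant, its gain, the constant disturbance, the integration gain and the step size
are all hypothetical, and the feedback term uses the speed experienced in the previous
step. The controller knows only a coarse inverse model (A = 1), yet the integrated term
absorbs what the static term cannot correct.

```python
def simulate_sds(A=1.0, gain=10.0, steps=1000, dt=0.01,
                 plant_gain=0.5, disturbance=0.3, target=1.0):
    """Scalar speed-field tracking with static plus integrated feedback."""
    x, w, x_dot = 0.0, 0.0, 0.0
    for _ in range(steps):
        v = target - x                  # desired speed: difference to the target
        u_fb = A * (v - x_dot)          # static feedback, form of Eq. (9)
        w += gain * u_fb * dt           # integrated feedback, form of Eq. (7)
        u = u_fb + w                    # combined control, form of Eq. (8)
        x_dot = plant_gain * u + disturbance  # plant, unknown to the controller
        x += x_dot * dt
    return x, w, x_dot

x_end, w_end, x_dot_end = simulate_sds()
```

In this toy setting the state settles at the target, and the integrated term converges to
the value that cancels the constant disturbance (here -disturbance / plant_gain).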
Reconstruction network. 

Basic reconstruction network. The basic reconstruction network (Fig. 4(A)) has two
layers: the reconstruction error layer and the hidden layer. The reconstruction error layer
computes the difference ($e \in \mathbb{R}^s$) between input ($x \in \mathbb{R}^s$) and
reconstructed input ($y \in \mathbb{R}^s$): $e = x - y$. Reconstructed input $y$ is produced
by the hidden (internal) representation $h \in \mathbb{R}^r$ via top-down transformation
$Q$, where $Q \in \mathbb{R}^{s \times r}$, and $r < s$ or $r > s$ are both
possible at the expense of some mild non-linearities (see, e.g., [Olshausen and Field, 1997]
and references therein). The hidden representation is corrected by the bottom-up
transformed form of the reconstruction error $e$, that is, by $We$ (where
$W \in \mathbb{R}^{r \times s}$). The process of correction means that the previous value of
the hidden representation is to be maintained and the correcting amount needs to be added.
In turn, the hidden representation has a self-excitatory connection set, which maintains the
activities. Correction occurs in time and this (continuous or discrete) temporal collection
of correcting terms is denoted by $\int dt$. For sustained input $x$ the iteration will stop
when $Qh = x$ or when the bottom-up transformed values of $Qh$ and $x$ are equal:
$WQh = Wx$. In turn, for each input, the hidden representation is determined by top-down
transformation $Q$. The bottom-up transformation can restrict the range of the
reconstruction. For example, if $W$ projects to a subspace, then reconstruction can be
executed only within that subspace.
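
The correction loop described above can be written down directly. The following sketch
assumes a hypothetical top-down matrix Q (hidden dimension 2, input dimension 3) and a
bottom-up matrix chosen as a scaled transpose of Q; this choice of W and the scaling
factor are assumptions of the example. The loop accumulates the bottom-up transformed
error in the hidden representation until the reconstruction matches the sustained input.

```python
def matvec(M, v):
    """Matrix-vector product for lists of rows."""
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # top-down: hidden (2) -> input (3)
eta = 0.3                                   # hypothetical scaling of the BU weights
W = [[eta * q for q in row] for row in transpose(Q)]  # bottom-up: W = eta * Q^T

x = [2.0, -1.0, 1.0]                        # sustained input, reconstructible by Q
h = [0.0, 0.0]                              # hidden representation
for _ in range(100):
    y = matvec(Q, h)                        # reconstructed input
    e = [xi - yi for xi, yi in zip(x, y)]   # reconstruction error
    h = [hi + ci for hi, ci in zip(h, matvec(W, e))]  # maintain h, add correction
y = matvec(Q, h)
```

For this input the iteration stops changing once Qh equals x, with h close to (2, -1).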

Figure 4. Reconstruction networks 

A: Simplest reconstruction network. $x$ and $y$: input and reconstructed
input, respectively. $W$ and $Q$: bottom-up (BU) and top-down (TD)
transformations, respectively. $\int dt$: summation in (or integration by) time.
B: Reconstruction network capable of noise filtering and pattern completion.
$s$: BU processed reconstruction error with maximized information,
$h$: activity vector of the hidden (model) layer, $N$: transformation from the
BU processed reconstruction error layer to the model layer. $P$: linear BU
transformation followed by sparsifying thresholding. Thresholding gates
components that deliver noise, but no information. For a perfectly tuned
network, $P = W$, and either $QNW = I$ ($s < r$) or $NWQ = I$ ($r < s$).
$M$: recurrent collateral system for temporal integration and pattern
completion. Dark gray inset: $B$ and $\dot{v}$ are tools of control. The controller
provides the temporal derivative of the control vector, $B\dot{v}$, where $v$ is the
desired momentum of the controller. The derivative $B\dot{v}$ is integrated at
the internal representation.



For $r < s$, we call the network perfectly tuned if $W = (Q^T Q)^{-1} Q^T$, i.e., if $WQ = I$
($I \in \mathbb{R}^{r \times r}$). In this case, activities of the hidden layer become perfect
after a single bottom-up processing step and the network works like a feedforward net.
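
The single-step property of a perfectly tuned network can be verified numerically. The
sketch below computes the left inverse W = (Q^T Q)^{-1} Q^T for a hypothetical Q with two
columns (the explicit 2 x 2 inverse is a shortcut that only works for this small case) and
checks that one bottom-up step from an empty hidden layer already yields the hidden
representation.

```python
def matmul(A, B):
    """Matrix product for lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def left_inverse_two_columns(Q):
    """W = (Q^T Q)^{-1} Q^T for a full-rank Q with exactly two columns."""
    Qt = [list(col) for col in zip(*Q)]
    (a, b), (c, d) = matmul(Qt, Q)          # the 2 x 2 matrix Q^T Q
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    return matmul(inv, Qt)

Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # hypothetical top-down matrix
W = left_inverse_two_columns(Q)
WQ = matmul(W, Q)                           # should be the 2 x 2 identity

x = [2.0, -1.0, 1.0]                        # input generated by h = (2, -1)
h = [sum(w * xi for w, xi in zip(row, x)) for row in W]  # one BU step from h = 0
```

Because the hidden layer starts from zero, the first reconstruction error equals the input,
so a single correction step We = Wx already gives the final hidden activities.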

Reconstruction network plus. The simple network of Fig. 4(A) can be extended to
support noise filtering. For this purpose, the reconstructed input vector $y$ is represented
separately. Input vector $x$ is compared to the reconstructed input $y$ in the error layer:
$e = x - y$. The error is transformed by the bottom-up (BU) transformation matrix $W$ and
forms the BU transformed error $s$. BU transformation maximizes BU information transfer
in order to facilitate reconstruction. The BU transformed error is passed to the internal
representation layer through transformation matrix $N$ (the role of this transformation
shall be discussed later) and is added to the internal representation $h$ of the 'hidden'
(or model, or internal representation) layer. The activity of the hidden layer is maintained
by the diagonal elements of recurrent matrix $M$, which, beyond temporal integration, can
support category formation [Keri et al., 2002] as well as temporal prediction
[Rao and Ballard, 1997] via its off-diagonal elements.
The network of Fig. 4(B) works as follows. Off-diagonal elements of recurrent
matrix $M$ of the hidden layer perform pattern completion [Lorincz et al., 2002b] and
temporal prediction [Rao and Ballard, 1997]. The newly introduced layer denoted by
$s$ ($s \in \mathbb{R}^r$) will be called the ICA layer. This ICA layer plays a role in noise
filtering. There are two different sets of afferents to the ICA layer: one carries the
BU transformed error, whereas the other carries information about the reconstructed
input $y$ via bottom-up transformation $P$, followed by a non-linearity that removes noise
via thresholding. Thresholding is similar to wavelet denoising [Mallat, 1998], with the
exception that the filters are not necessarily wavelets but are optimized for the input
database experienced by the network. Optimization makes use of independent component
analysis (ICA) [Jutten and Herault, 1991, Comon, 1994, Laheld and Cardoso, 1994,
Bell and Sejnowski, 1995, Amari et al., 1996, Karhunen et al., 1997, Amari, 1998]. ICA
makes the assumption that input is generated by statistically independent sources

(12) $x = Cr + \nu$

(13) $P(r_1, \ldots, r_r) = \prod_{k=1}^{r} P(r_k)$

where $\nu$ denotes additive Gaussian noise, $r \in \mathbb{R}^r$ denotes the original
sources and $C \in \mathbb{R}^{s \times r}$ is called the mixing matrix. The ICA algorithm
intends to reproduce the original sources and minimizes mutual information between ICA
transformed components. The multiplication of vector $r$ by matrix $C$ makes the
identification of matrix $C$ ill posed, because any rescaling of a source can be compensated
by the inverse rescaling of the corresponding column of $C$. The problem becomes well posed
by constraining the variances of the searched components. For example, variances can be
constrained to one. In this case, minimization of mutual information is equivalent to
maximizing the sum of the negentropies (the non-Gaussian character) of uncorrelated
estimates [Hyvarinen et al., 1999]. This feature enables the local estimation and local
thresholding of noise components [Hyvarinen et al., 1999, Hyvarinen, 1999], where locality
means that noise of each component of the ICA layer can be thresholded independently of the
value of other ICA components. The method is called sparse code shrinkage (SCS) and the
process is referred to as sparsification.

Thus, $s$ is the ICA transformed and sparsified reconstruction error. Note, however,
that sparsification concerns the components of the BU transformed reconstructed input:
high amplitude components of $Py$ (BU transformed reconstructed input) open the gates
of components of the ICA layer, and the ICA transformed reconstruction error can pass these
open gates to correct the internal representation. Low amplitude components of the $P$
transformed reconstructed input cannot open the gates and corrections of these components
are rejected, unless the corrections themselves are large enough to overcome the
sparsification process.
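
The gating logic of the preceding paragraph can be sketched as a component-wise rule. The
hard thresholds used below are a deliberate simplification and an assumption of this
example; sparse code shrinkage derives its shrinkage functions from the estimated
component statistics. The function name, the threshold values and the sample vectors are
hypothetical.

```python
def gate(bu_error, bu_recon, recon_threshold=0.5, error_threshold=2.0):
    """Pass an error component if its gate is open or the error itself is large.

    bu_error: components of We (BU transformed reconstruction error)
    bu_recon: components of Py (BU transformed reconstructed input)
    """
    passed = []
    for err, rec in zip(bu_error, bu_recon):
        gate_open = abs(rec) > recon_threshold     # opened by the reconstructed input
        large_error = abs(err) > error_threshold   # correction overcomes the gate
        passed.append(err if (gate_open or large_error) else 0.0)
    return passed

We = [0.3, 0.3, 3.0, -0.1]   # hypothetical BU transformed errors
Py = [1.2, 0.1, 0.0, 0.05]   # hypothetical BU transformed reconstructed input
s = gate(We, Py)
```

Here the first component passes because its gate is open, the third passes because the
correction itself is large, and the other two are rejected.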

Apart from sparsification, the reconstruction network is a linear network. In what
follows, we shall denote this property by the sign $\sim$: $A \sim B$ means that
for a well tuned system and up to a scaling constant (or scaling matrix) quantity $A$ is
approximately equal to quantity $B$.
For a well tuned network and if matrix $M$ performs temporal integration (i.e., if
matrix $M$ does not perform temporal prediction), $s \sim \dot{x}$ by construction. If matrix $M$
performs temporal prediction and the network is perfectly tuned, then $s \sim \ddot{x}$. These
quantities could be processed and represented at higher layers. For the sake of simplicity
of considerations, assume that matrix $M$ performs temporal integration and nothing else.
Then we can approximate the output of layer $s$ as a noise filtered linearly transformed
(i.e., filtered and scaled) form of $\dot{x}$ (footnote 1). We shall further simplify the
notation. The output of the ICA layer will be denoted by $\sim \dot{x}$, where
$\sim \dot{x}$ means 'the scaled version' of $\dot{x}$. In a similar vein, internal
representation $h \sim y$. Also, we shall use the notation $y \sim x$,
although the former is the noise filtered version of the latter.

Note that considerable reconstruction error can build up, e.g., by top-down
influence. Considering longer temporal scales, $y$ is, by construction, equal to the
temporal integral of the noise filtered reconstruction error: $y = \int e_f\, dt$, where
$e_f$ denotes the noise filtered reconstruction error. This feature will be exploited later.

3. The joined model 

3.1. Controlled reconstruction network. The reconstruction network can be
controlled (Fig. 5(B)). Control adds an extra contribution to the hidden layer, namely
$B\dot{v}$. The 'dot' on $v$ is the consequence of adding the contribution to the internal
representation, where temporal integration occurs. In turn, the action of the controller is
equal to the temporally integrated value of the extra contribution, that is, $Bv$.

To achieve approximately perfect control, we shall make use of the SDS
controller. The controller is made of another ('higher') reconstruction network, which
receives input from the network under control (the 'lower' network). This input is equal to
the output of the ICA layer of the lower network. Control works by subtracting the desired
speed from the input of the higher network. That makes the input to the network equal
to $\dot{x} - v(x)$ (footnote 2). By construction, (i) the input is noise filtered and
reconstructed at the reconstructed input of the network and (ii) apart from the noise
content, the error layer approximates the temporal derivative of the reconstructed input
(Fig. 5(B)). These two differences undergo linear transformation and enter the internal
representation layer of the lower network, where, by construction, they add up and undergo
temporal integration. This is exactly what is needed for the SDS controller. Moreover,
noise filtering of the reconstructed input is of particular importance, because noise
entering temporal integration seems to be the weakest point of the controller
[Szepesvari and Lorincz, 1997a].

Learning to control in the SDS scheme is relatively simple. Roughly speaking,
learning is sufficient if the signs of the components of the control vector and the domains
where these components should not change sign have been determined. This is why the
negative sign of $\dot{x} - v(x)$ (instead of $v(x) - \dot{x}$) does not count: signs of
components of matrix $B$ need to be determined by learning. SDS warrants that if the signs of the control

Footnote 1: Extension to spatio-temporal pattern completion is straightforward. Redefining
the state as the concatenation of $x$ and $\dot{x}$ and speed as the concatenation of
$\dot{x}$ and $\ddot{x}$ gives rise to the same formalism that we are using here
[Szepesvari and Lorincz, 1997a].

Footnote 2: Note the negative sign of this difference, which we shall discuss in the next
paragraph.

Figure 5. Controlled networks 

Notation: $\sim z$: scales (approximately) linearly with $z$.
A: Controller receives the state ($\sim x$) and the speed ($\sim \dot{x}$) from the
reconstruction network under control as well as the desired speed ($v(x)$) from
somewhere else and forms the difference between them.
B: Reconstruction network acting as a controller. $A_1$ and $A_2$ carry
information about the state and the speed of the network under control,
respectively. The difference between speed and desired speed is the input to
the network. The noise filtered and reconstructed value of this difference
appears at the reconstructed input. By construction, the reconstruction
error layer holds the approximate temporal derivative of the reconstructed
input. These differences are turned into the feedback control vector
$u_{fb}$ and its temporal derivative $\dot{u}_{fb}$ by means of transformations $B_2$ and
$B_1$, respectively. The outputs of these transformations are temporally
integrated at the internal representation of the lower network to form the
components of the SDS controller. They add up here and provide almost
perfect and stable control.



components are proper, then control will be globally stable and approximately perfect.
Mathematical details of sign-properness can be found elsewhere [Szepesvari et al., 1997,
Szepesvari and Lorincz, 1997a, Lorincz, 2003]. Numerical demonstrations using coarsely
tuned but sign-proper controllers have been provided in [Lorincz et al., 2001].
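
The role of sign-properness can be illustrated with a toy experiment; it is not a
reproduction of the cited numerical demonstrations. The sketch below runs a scalar
static-plus-integrated feedback loop on a hypothetical plant with several coarse feedback
gains A. All numerical values are assumptions of the example. Gains with the proper sign
differ by a factor of three from each other, yet all of them reach the target, whereas a
gain with the wrong sign drives the state away from it.

```python
def final_error(A, gain=10.0, steps=2000, dt=0.01,
                plant_gain=0.5, disturbance=0.3, target=1.0):
    """Distance from the target after running the scalar loop with feedback gain A."""
    x, w, x_dot = 0.0, 0.0, 0.0
    for _ in range(steps):
        u_fb = A * ((target - x) - x_dot)   # static feedback on the speed error
        w += gain * u_fb * dt               # integrated feedback
        x_dot = plant_gain * (u_fb + w) + disturbance  # hypothetical plant
        x += x_dot * dt
    return abs(target - x)

# Three coarse but sign-proper gains and one gain with the wrong sign.
errors = {A: final_error(A) for A in (0.5, 1.0, 1.5, -1.0)}
```

In this setting only the sign of A relative to the plant gain decides between convergence
and divergence; the magnitude of A affects the speed of convergence.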

The hierarchy is highly non-linear for several reasons. The higher the
reconstruction network in the hierarchy, the higher the order of the dynamic properties of
the environment that are learned and represented by it. Also, the condition of robust control
concerns the shattering of the space into sign-proper domains, and the dynamic contribution
of the controller can be highly non-linear within sign-proper domains. Switching between
domains is highly non-linear, too. Lastly, sparsification is another source of non-linear
processing.

Another note concerns the result of controlling. The SDS controller warrants that
the desired quantity is closely approximated under the condition that the system is
controllable. Until this point, control concerned a lower reconstruction network, which may
receive input from the environment (Fig. 5(B)). Conditions of the SDS controller are not
fulfilled unless control action also concerns this input, or this input is zero. In the former
case, control of the environment needs to be included into the architecture. The latter case
corresponds to the absence of external inputs, such as in pattern completion or dreaming.

The perfectly tuned architecture behaves as a bottom-up feedforward network,
which is biased by top-down influence. Consider the lower reconstruction network of
Fig. 5(B). The bias will modify the internal representation of the lower reconstruction
network, which may or may not fit the input from the environment. If it fits, then any
reconstruction error disappears quickly. If it does not fit, a large reconstruction error
may build up at the reconstruction error layer, but only a small portion of this error can
pass the sparsification process at the ICA layer. This is because sparsification is ruled
by the reconstructed input that the internal representation generates. That is, information that
matches the context of the higher reconstruction network will be able to pass sparsification,
whereas other information will be filtered out, or attenuated, by the bottom-up
sparsification process. In turn, top-down influence 'paves the way' of some of the components.
The process of filtering out certain components of information and paving the way of the
others can be seen as paying attention to the latter components.
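
The gating effect described above can be sketched in a few lines. This is only a toy illustration under stated assumptions: the function name `gated_sparsify`, the winner-take-k rule, and the hard zeroing of the blocked components are hypothetical choices made here, not the sparsification rule of the model.

```python
def gated_sparsify(error, reconstructed_input, keep=2):
    """Pass the `keep` error components whose top-down reconstructed input is
    strongest; the remaining components are attenuated (here: set to zero)."""
    ranked = sorted(range(len(error)),
                    key=lambda i: abs(reconstructed_input[i]),
                    reverse=True)
    allowed = set(ranked[:keep])
    return [e if i in allowed else 0.0 for i, e in enumerate(error)]


# Bottom-up reconstruction error is large in three of the four components ...
bottom_up_error = [0.9, 0.1, 0.8, 0.7]
# ... but the top-down context supports only components 0 and 3.
top_down_context = [1.0, 0.0, 0.0, 0.6]

passed = gated_sparsify(bottom_up_error, top_down_context)
# passed == [0.9, 0.0, 0.0, 0.7]
```

Component 2 carries a large error but is blocked because the context does not support it, which is the sense in which top-down influence selects what may propagate upwards.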

3.2. Closing the loops of the hierarchy. The top of the hierarchy has a special role.
At the top, the sensory bottom-up information flow should be turned into context-like
top-down information, which can enforce, shape, or influence (say, control) lower
representations. This reversal of the direction of information flow becomes possible if the
reconstruction error feeds the internal representation of a lower reconstruction network.
In this case, we shall have a network with a reconstruction error layer and an internal
representation, which, when put together, make up a reconstruction network. On the other
hand, we also have a controller, because error at a higher level guides the internal
representation of a lower level. The idea is illustrated in Fig. 6(A). The figure uses the same
gray coding as Fig. 5: dark and light gray areas correspond to reconstruction and control
networks, respectively. There are two dark and two light areas. All of the components
are depicted, but some of them are denoted by lighter dashed lines. These connectivity
structures and neural layers are missing at the top. Notation corresponds to the cortical
layers. Layers x, y, s, and h represent superficial layers II, III, granular layer IV, and deep
layers V-VI, respectively. The highest layer is denoted by HC, short for hippocampus.
The upper control layer is denoted by EC, short for the entorhinal cortex.

The following differences are to be noted. First, there is no top-down connection
from EC layers II and III to the EC deep layers. In turn, these layers cannot exert
control action onto the corresponding internal representation. Instead, the HC is in the
Figure 6. The top of the hierarchy

A: HC: hippocampus. II-III: superficial layers. IV: granular layer. V-VI:
deep layers. The top turns information flow from bottom-up direction
to top-down direction: HC acts on the internal representation of a lower
reconstruction network. HC together with the layers encompassed by the
grey box indexed by EC, i.e., the entorhinal cortex, which has no granular
layer, form a reconstruction network.

B: Mapping of the top to the EC-HC loop. ICA is made in two steps: (i)
whitening (CA3 subfield of HC) and (ii) separation (CA1 subfield of HC).
Blind source deconvolution, the putative role of the dentate gyrus, removes
temporal correlations. Reconstruction error is computed at EC layer II.
Reconstructed input: EC layer III. Internal (hidden) representation: EC
layers V and VI. Temporal integration (persistent activities) and possibly
pattern completion is the putative role of the recurrent collaterals of EC
layers V and VI. EC afferents of area CA1: denoising.
position to exert control action: it receives the reconstruction error from the lower layer
and acts upon the hidden representation of that reconstruction network. On the other
hand, consider the connectivity of the deep layers of the EC, the superficial layers of the
EC and the HC. These structures, together, form a reconstruction network. This can be
easily verified by shifting the deep layers of the EC next to the HC in Fig. 6(A). In turn,
the HC and its environment have a double role: they form a control network and a reconstruction
network. The control network acts upon a lower internal representation and thereby reverses the
bottom-up flow of sensory information into top-down control, making use of the context
of higher reconstruction networks. Using the concepts of the controller hierarchy, higher
order dynamical information corrects lower order approximations.

Note that information flow from the lowest layers (s and h) passes through the
ladder of two reconstruction error layers (superficial layer II). This feature cannot be
explained within the framework of the model. It allows us to pinpoint the limitations
of our modelling efforts. The functional model cannot explain the recurrent collaterals
of the superficial layers, a prominent structure of neocortical regions. This connectivity
is thought of as the extension of the associative cortices (see, e.g., [Diamond, 1979] and
references therein). Our model suggests that two such layers can be seen as a single
but larger layer, which is in agreement with the functionality suggested by Diamond.

A detailed description of the HC has been provided elsewhere [Lorincz and Buzsaki, 2000;
Lorincz et al., 2002b] and will be reviewed in the next section.



4. Discussion 



Our first note concerns Adaptive Resonance Theory (ART) pioneered by Grossberg
and colleagues [Grossberg, 1980; Carpenter and Grossberg, 1987;
Raizada and Grossberg, 2003]. ART proposes that sensory processing is twofold: it
is made of bottom-up filtering as well as of top-down template matching. The underlying
mechanism of ART, namely the resonant circuitry, differs from the twofold comparator
function embodied by our architecture. The mapping of ART to the neurobiological substrate
[Grossberg and Carpenter, 1993; Raizada and Grossberg, 2003] is, in turn, different from
ours, which we shall present below.

Our modelling efforts have certain particular properties: We have started from a
functional hypothesis about the importance of control and noise filtering and have introduced
a hierarchical system, which should be capable of both. At each step, mathematical
tools were used to restrict our freedom. Possible solutions that were intractable from
the point of view of stability and noise filtering, or that could not be
mapped to the anatomical structure, or that did not fit known physiological properties and
known results of computational neuroscience, were dropped [Lorincz et al., 2000]. We call
this function based, structure constrained and mathematics supported effort Ockham's
modelling [Lorincz et al., 2001; Lorincz et al., 2002a]. ART does the same. Our
comparator model is different, because it starts from control principles and
assumes the universality of the comparator hypothesis. For a control network, comparison
between desired and experienced quantities seems reasonable.
The model offers falsifying predictions; i.e., predictions which could constrain or
defeat the model. These predictions will be listed below. First, we shall provide the
mapping to the substrate, a most crucial constraint for us. The mapping of the reconstruction
network has been elaborated before [Lorincz et al., 2002b]. It is reviewed here for the sake
of completeness.



4.1. Matching the anatomy. The neocortex is made of six sub-layers (Fig. 7). The
figure depicts the most prominent connections between these sub-layers [Lund, 1988].
Input typically arrives at layer IV. Layer IV neurons send messages to layer II and layer III
(not shown). Furthermore, layer IV neurons also send messages to layer VI. Superficial
neurons provide output down to layers V and VI. There are connections between neurons
of layer V and layer VI. Neurons of layers II and III are also strongly connected. Layer V
provides feedback to layers II and III. The main output to higher cortical layers emerges
from layers II and III. The main feedback to lower layers is provided by layer V. (For a
review see, e.g., [Callaway, 2000].)



Figure 7. Neocortical layer. Arrow labels in the figure: to higher layers, from lower
layers, to lower layers.



The theoretical model and the anatomical structure can be matched by assuming
that reconstruction networks are laid between neocortical layers, as denoted by the
dark gray areas in Fig. 6(A) [Lorincz et al., 2002b]. According to this figure, superficial
layers of the lower cortical layer and deep layers of the higher cortical layer form one
functional unit, the reconstruction network.

The novel anatomical suggestion of this paper is the mapping of a robust controller
to the cortical layers (denoted by light gray boxes of Fig. 6(A)). The reconstruction error
and the noise filtered temporal integral of this error, that is, the reconstructed input, exert
control action on lower reconstruction networks; this corresponds to the information sent
from superficial layers to deep layers. Experienced quantities correspond to the information
propagating in the other direction, i.e., experienced quantities are sent to superficial
layers by layers IV and V.
Mapping of the top to the EC-HC loop is shown in Fig. 6(B). Independent component
analysis is executed in two steps: (i) whitening (CA3 subfield of HC) and (ii)
separation (CA1 subfield of HC). Reconstruction error and reconstructed input are
computed in EC layer II and EC layer III, respectively. Internal (hidden) representation is
encompassed by EC layers V and VI. These deep layers perform the pattern completion
task. The EC afferents of the CA1 subfield can release the gates of the CA1 outputs and
denoised signals can pass. The recurrent collaterals of the CA3 subfield replay learned
sequences. Blind source deconvolution (BSD) is the putative role of the dentate gyrus. BSD
removes temporal correlations accumulated by temporal integration of not-yet tuned lower
reconstruction networks. BSD is necessary for proper ICA analysis. However, the number
of neurons of a BSD structure can be very large. It is assumed that this computation can
be afforded only at the top of the hierarchy. In turn, the dentate gyrus plays a unique role
in our model. The assumption gains further support from top-down controlling: control
can accelerate reconstruction [Lorincz, 1998]. In turn, BSD at lower networks may not be
necessary. This point deserves further investigation. More details can be found elsewhere
[Lorincz and Buzsaki, 2000].
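
The two-step structure of ICA (whitening followed by separation) can be made concrete with a minimal two-channel sketch. It is an illustration under explicit assumptions, not the learning rule of the model: whitening uses the closed-form eigendecomposition of a 2x2 covariance matrix, separation is a brute-force search for the rotation that maximizes the summed absolute excess kurtosis, and the sources, mixing coefficients and function names are all hypothetical. The BSD step is omitted because the toy sources have no temporal correlations.

```python
import math
import random


def mean(v):
    return sum(v) / len(v)


def covariance(u, v):
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)


def correlation(u, v):
    return covariance(u, v) / math.sqrt(covariance(u, u) * covariance(v, v))


def whiten(x1, x2):
    """Step (i): decorrelate two channels and scale them to unit variance."""
    m1, m2 = mean(x1), mean(x2)
    x1 = [a - m1 for a in x1]
    x2 = [b - m2 for b in x2]
    a, b, d = covariance(x1, x1), covariance(x1, x2), covariance(x2, x2)
    # Principal axis and eigenvalues of the symmetric matrix [[a, b], [b, d]].
    theta = 0.5 * math.atan2(2.0 * b, a - d)
    c, s = math.cos(theta), math.sin(theta)
    half_gap = math.hypot(0.5 * (a - d), b)
    l1 = 0.5 * (a + d) + half_gap
    l2 = 0.5 * (a + d) - half_gap
    z1 = [(c * p + s * q) / math.sqrt(l1) for p, q in zip(x1, x2)]
    z2 = [(-s * p + c * q) / math.sqrt(l2) for p, q in zip(x1, x2)]
    return z1, z2


def kurtosis(v):
    """Excess kurtosis of a zero-mean, unit-variance signal."""
    return mean([a ** 4 for a in v]) - 3.0


def separate(z1, z2, steps=90):
    """Step (ii): rotate whitened data to the most non-Gaussian axes."""
    best = None
    for k in range(steps):
        phi = 0.5 * math.pi * k / steps
        c, s = math.cos(phi), math.sin(phi)
        y1 = [c * p + s * q for p, q in zip(z1, z2)]
        y2 = [-s * p + c * q for p, q in zip(z1, z2)]
        score = abs(kurtosis(y1)) + abs(kurtosis(y2))
        if best is None or score > best[0]:
            best = (score, y1, y2)
    return best[1], best[2]


random.seed(0)
n = 1000
s1 = [random.uniform(-1.0, 1.0) for _ in range(n)]   # hypothetical sources
s2 = [random.uniform(-1.0, 1.0) for _ in range(n)]
x1 = [0.8 * a + 0.3 * b for a, b in zip(s1, s2)]     # hypothetical mixing
x2 = [0.4 * a + 0.9 * b for a, b in zip(s1, s2)]

z1, z2 = whiten(x1, x2)
y1, y2 = separate(z1, z2)
match = max(abs(correlation(y1, s1)), abs(correlation(y1, s2)))
```

Whitening alone leaves an unknown rotation between the whitened signals and the sources; only the second step resolves it, which is why the two steps can be assigned to separate stages.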



4.2. Predictions of the model. Lorincz and colleagues have shown previously that some
memory effects, such as repetition suppression and priming [Lorincz et al., 2002c], as well
as the particular properties found for Alzheimer patients in a classic prototype learning
paradigm (see, e.g., [Knowlton, 1999] and references therein about '9 dot' experiments),
can be explained by the reconstruction network model [Keri et al., 2002].



A crucial prediction of our model is that temporal convolutions should be removed
before ICA occurs. Given that ICA is the putative role of the CA3 and CA1 subfields,
only the dentate gyrus can be responsible for this task. In turn, long and tunable
delay lines should exist in the dentate gyrus. This prediction has been reinforced recently
[Henze et al., 2002].



Another falsifying prediction of the model concerns the internal representation
layer, which has to maintain its own activities in order to enable additive corrections and
temporal integration. Persistent activities in the deep layers but not in the superficial
layers of the EC have been found experimentally [Egorov et al., 2002].
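
Why the internal representation must hold its own activity can be seen in a minimal reconstruction loop. The sketch below is an assumption-laden toy, not the model's dynamics: the top-down matrix `Q`, the choice of the transpose of `Q` as the bottom-up correction, and the step size are hypothetical. The hidden state is never recomputed from scratch; it only receives additive corrections driven by the reconstruction error, so it has to persist between steps.

```python
def relax(actual_input, top_down, rate=0.2, steps=200):
    """Integrate error-driven corrections into a persistent hidden state."""
    n_hidden = len(top_down[0])
    hidden = [0.0] * n_hidden
    for _ in range(steps):
        reconstructed = [sum(q * h for q, h in zip(row, hidden))
                         for row in top_down]
        error = [x - r for x, r in zip(actual_input, reconstructed)]
        # Additive correction; bottom-up weights are assumed to be Q transposed.
        for j in range(n_hidden):
            hidden[j] += rate * sum(top_down[i][j] * error[i]
                                    for i in range(len(error)))
    reconstructed = [sum(q * h for q, h in zip(row, hidden)) for row in top_down]
    return hidden, reconstructed


Q = [[1.0, 0.5],
     [0.0, 1.0]]          # hypothetical top-down (hidden -> reconstructed input)
x = [1.0, 2.0]            # input to be reconstructed

hidden, reconstructed = relax(x, Q)
```

Once the reconstruction matches the input, the error vanishes and the hidden state stays where it is, which is the toy counterpart of persistent activity.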

An intriguing and falsifying prediction of the joined model, like that of its previous
versions [Lorincz and Buzsaki, 2000], is that top-down connections of reconstruction networks
can be interpreted as long-term memories (LTMs), because these connections are
responsible for the relaxed activities of the hidden layers. Given our mapping, the LTM corresponds
to feedback connections between neocortical areas. These connections are generally more
numerous than the feedforward connections between the same areas, but the activity flow
along these connections is relatively low and suggests a weak functional role (see, e.g.,
[Callaway, 2000] and references therein). This apparent discrepancy may be resolved by
noting that different interpretations may coexist in the brain, as has been made evident in
the animal experiments on binocular rivalry (see, e.g., [Leopold and Logothetis, 1999] and
references therein) and in experiments with several possible visual interpretations (see, e.g.,
[Leopold, 2003; Parker and Krug, 2003] and the cited references). If reconstruction
concerns a single interpretation then feedback activity flow should be small. This possibility
cannot be excluded for the following reasons. There is evidence that activities
in V4 (responsible for conscious detection of colors) and V5 (responsible for conscious
detection of fast motion) in the monkey are uncorrelated. According to the arguments
put forth by Zeki [Zeki, 2003], uncorrelated activities indicate that conscious experiences
propagate downwards along parallel channels. Moreover, the conscious binding of the
result of the individual conscious experiences seems to be delayed [Bartels and Zeki, 2002].
In turn, it is possible that only one interpretation is communicated downwards at a time.
Another point concerns the suggested function, that the internal representation is under
top-down control. This controlled representation will then propagate downwards to form
the reconstructed input and also directly to real control networks [Diamond, 1979]. Given
the long delays of processing, a well tuned control system should not interact often.
In turn, the assumed function involves relatively sparse information flow.

It is important to note that in the original control model of the EC-HC loop
[Lorincz, 1998] the dentate gyrus was suggested as the source of temporal integration.
Only the making of the reconstruction network model [Lorincz and Buzsaki, 2000;
Chrobak et al., 2000] revealed that temporal integration has to be executed at the hidden
layer and not at the dentate gyrus, whereas temporal convolutions produced by temporal
integrations accomplished in lower networks can be removed by means of the BSD
algorithm at the dentate gyrus.

According to recent measurements, awareness and attention need to be distinguished
(for an excellent review, see [Lamme, 2003]). Attention increases neuronal
activities responsible for the processing of the attended stimuli [Desimone and Duncan, 1995].
Most probably, endogenous attention facilitates the pathways that should be used by
the attended stimuli [Egeth and Yantis, 1997]. In our model, facilitation can manifest
itself through control action within cortical layers. On the other hand, awareness involves
recurrent interactions between areas and can be suppressed by backward masking (see
[Lamme, 2003] and references therein). This recurrent interaction required for awareness
is our candidate function of the feedback connections between cortical areas.

Another note concerns top-down pattern completion: The controller network com- 
bines bottom-up information from different columns, areas and modalities and develops 
the context for lower level internal representation. Control action then corresponds to 
context based pattern completion. The efficiency of strong top-down control, such as 
overwriting, has been the subject of computer studies [Lorincz et al., 2002b].



The merging of the two kinds of comparator networks solves the noise sensitivity 
of the controller. This noise filtering is fast and optimal, and it is an emerging property 
in the joined architecture. 

According to the model, the EC deep layer to EC superficial layer synapses form
the long-term memory of the EC-HC loop. On the other hand, the HC plays a particular,
though not unique role. The HC is the top comparator that turns reconstruction error
into control signal. This control signal excites the neurons of the EC deep layers. This is a
unique position to encode inputs into the synapses between EC superficial and deep layers
by Hebbian means. Similar roles can be played by all top-down control signals at other
levels. These control signals target the deep layers and, similarly to the HC output, they
may enable (or facilitate) Hebbian learning of the LTM. Moreover, the control signal may
propagate from the higher reconstruction networks to lower ones and encoding at higher
reconstruction networks may influence encoding at lower ones, too.



Our model, like its previous versions [Lorincz, 1998; Lorincz and Buzsaki, 2000;
Lorincz et al., 2002b], is neither a model for episodic learning, nor a model for incremental
learning and does not fit such traditional distinctions (see, e.g., [Gluck et al., 2003]). On
the one hand, when information maximization is not modulated by behavioral relevance,
the model is an incremental model engaged in the maximization of information transfer,
noise filtering and pattern completion using information theoretic algorithms in Hebbian
forms [Lorincz et al., 2002b]. On the other hand, the controller is a top-down tool, which
can facilitate learning and could make learning instantaneous, if behavioral relevance
requires: For a given input, and by activating a unit of the hidden layer, Hebbian learning
will make that unit of the hidden layer represent (encode) the actual input by means
of its top-down synapses. Such a mechanism can shortcut statistical analysis and may
imprint the input into the internal representation. According to our model, the increased
learning rate in deep layer afferents of the superficial layers corresponds to the supervisory
instruction for the activated deep layer unit(s): Remember the actual input!
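
The difference between incremental learning and instantaneous imprinting can be sketched as follows. The update used here (learning rate times reconstruction error times hidden activity) is an assumed error-driven form of a local Hebbian rule, and the function name, matrix layout and numbers are hypothetical; the text above does not fix these details.

```python
def hebbian_imprint(top_down, hidden, actual_input, rate):
    """Update the top-down weights Q (rows: input components, columns: hidden
    units) by rate * reconstruction error * hidden activity."""
    reconstructed = [sum(q * h for q, h in zip(row, hidden)) for row in top_down]
    error = [x - r for x, r in zip(actual_input, reconstructed)]
    return [[q + rate * e * h for q, h in zip(row, hidden)]
            for row, e in zip(top_down, error)]


Q = [[0.0, 0.0],
     [0.0, 0.0],
     [0.0, 0.0]]           # three input components, two hidden units
x = [0.2, -0.5, 0.9]       # the actual input
h = [0.0, 1.0]             # top-down control activates hidden unit 1 only

slow = hebbian_imprint(Q, h, x, rate=0.05)   # incremental regime
fast = hebbian_imprint(Q, h, x, rate=1.0)    # 'Remember the actual input!'

imprinted = [row[1] for row in fast]         # top-down weights of unit 1
```

With the learning rate raised to one, a single update makes the top-down synapses of the activated unit equal to the input, whereas the small rate moves them only a fraction of the way; the synapses of the inactive unit are untouched in both cases.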

The joined model offers no role for the recurrent collaterals of the superficial
layers. We believe that our continuous model cannot uncover the role of these
connectivity structures. Another missing feature of the neocortical structure is its columnar
organization. The continuous comparator model does not seem to offer any clue here.

Finally, we note that the forming of invariant place cells from retinal input,
irrespective of the motion of eyes, head and body and their learned and optimized joined or
disjoined motion patterns, corresponds to a plant of very high order. Our rate code model
justifies that invariant representations of place cells, the behaviorally important
components of problem solving in mazes, are represented in the hippocampus in rats and that the
information of different modalities is associated here. (For a review, see [Redish, 1999].)



5. Conclusions 

There is a large body of experimental data supporting the idea that attention
shapes (influences, controls) perception. For excellent reviews, see, e.g., [Duncan, 1999;
Posner and DiGirolamo, 2000; LaBerge, 2000] and references therein. We have presented
a unified model that optimizes bottom-up information transfer and filters (attenuates,
prohibits) the propagation of structureless noise. The model also influences top-down
processes, by comparing desired and experienced parameters in sensory information
processing. The unification of the control model [Lorincz, 1998] and the reconstruction
architecture model [Lorincz and Buzsaki, 2000] leads to falsifying predictions. Some of those
predictions have gained experimental support recently. For example, the model predicts
persistent activities in the deep layers of the entorhinal cortex, which have been found
experimentally [Egorov et al., 2002]. Another falsifying prediction, that the circuitry of the
dentate gyrus should support long delays, has also been reinforced [Henze et al., 2002].
A most intriguing prediction of the model is that the long-term memory of neocortical
visual areas corresponds to the feedback connections between areas. These feedback
connections are more numerous than the bottom-up connections between areas. However,
these feedback connections are relatively quiet. The model allowed us to distinguish
between attention and awareness, two delicate and intertwined concepts. Known features of
awareness allowed us to argue about the relative quietness of feedback connections between
areas: there should be only one available representation for awareness, whereas multiple
interpretations should coexist in representations not directly related to awareness. We
have argued that this interpretation fits recent physiological findings.